Abstract:

Lockheed Martin has developed a platform-independent, scalable, and reconfigurable Digital
Processor (DP) infrastructure for multiprocessor environments. The infrastructure is in use
within the Small System Processor (SSP) program. It provides communication, data flow,
processor/algorithm scaling, and configuration flexibility. All aspects of communication and
processing can be reconfigured without recompiling. Pipeline, round robin, and hybrid
processing architectures are supported, and the number of processors can also be changed
without recompiling. This flexibility comes from text “flow graph” files, each of which
describes a static processor mapping. Multiple flow graphs are supported.

A non-blocking multicast API is also provided. It is used to distribute the DP Stimulus
messages only to the processors that must participate in processing.

The communication infrastructure provides an efficient mechanism that decouples algorithm
development from the specific details of the data distribution. Algorithm data flow routines
support redistributing data from M to N processors, with or without data overlap or minimum
block sizes. M to N corner turn and algorithm corner turn routines are also provided. Both
blocking and non-blocking APIs are available.

This infrastructure is highly portable. It was developed on CSPI 2841 multiprocessors using
MPI as the underlying communication API and VSIPL as the vector math library. Because it is
based on industry-standard APIs, the infrastructure can run on any platform that supports
them. This has been validated on server-class as well as embedded platforms: no code changes
were made, only a recompile for the particular platform.
Hard Real Time DSP Challenge


Develop  a  Portable  and  Easily  Scalable  DSP 

—Portability  requires  the  use  of  Open  Architecture 
Standards 

•  Low Overhead Communication
—Message  Passing  Interface  (MPI) 

•  Vector  Processing 

—Vector  Signal  Image  Processing  Library  (VSIPL) 

•  Programming  Language  (C++) 

—Scalability  requires: 

•  An infrastructure which is highly configurable:

—Number  of  Processors 

—Round  Robin,  Pipeline  or  Hybrid  Data  Flow 

—Data  Redistribution  Support 

•  Frees the algorithm designer from the details of the data distribution
Real Time DSP Solution


DSP  Infrastructure  Description 
—Flow  Graph 

Defines  the  Data  Flow  and  Algorithm  Mapping  to  a 
Network  of  Processors 

—Based  on  a  Static  Map  of  DSP  processors 
—Infrastructure  Supports  Multiple  Flow  Graphs 
—Text  File  Based  (Easily  Modified) 

—Loaded  during  Software  Initialization 
—Easy  to  add  algorithms  or  modify  data  flow 

-  MPI  Intercommunicators  are  formed  based  on  Flow 
Graph  information. 

Provides  Stimulus  and  Data  Flow  Paths. 
Redistribution  API  uses  the  formed  Data  Paths. 

—Infrastructure  has  been  tested  on  Server  and  Embedded 
architectures  using  more  than  64  processors. 

No  code  modification  is  needed. 

DSP  recompiled  for  the  particular  architecture. 


Flow  Graph  Details 


MPI  Stimulus  and  Data  flow  Communication  paths  are 
formed  based  on  information  read  from  text  Flow_Graph 
files  during  initialization. 
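
The flow graph file syntax is not reproduced in this text, so the following is only a minimal sketch under an invented format: one "<stage name> <processor count>" pair per line, with '#' starting a comment. The names `Stage` and `parse_flow_graph`, the stage names in the usage note, and the file syntax itself are all hypothetical. The sketch shows how a text file read at initialization could yield a static stage-to-rank mapping, so that changing the processor counts needs no recompile. Forming the MPI intercommunicators from that mapping is omitted.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// One processing stage and the contiguous block of ranks assigned to it.
struct Stage {
    std::string name;
    int first_rank;
    int count;
};

// Parse a HYPOTHETICAL flow graph text (not the real file format):
// each non-empty line is "<stage name> <processor count>", '#' starts a comment.
// Ranks are handed out contiguously in file order.
std::vector<Stage> parse_flow_graph(const std::string& text) {
    std::vector<Stage> stages;
    std::istringstream lines(text);
    std::string line;
    int next_rank = 0;
    while (std::getline(lines, line)) {
        const std::string::size_type hash = line.find('#');
        if (hash != std::string::npos) line.erase(hash);  // strip comment
        std::istringstream fields(line);
        std::string name;
        int count = 0;
        if (!(fields >> name >> count)) continue;  // skip blank or malformed lines
        assert(count > 0);
        stages.push_back({name, next_rank, count});
        next_rank += count;
    }
    return stages;
}
```

With the input "pulse_compress 2", "doppler 3", "detect 1" (one per line), the sketch assigns ranks 0-1, 2-4, and 5 respectively. Editing the counts in the text changes the mapping with no change to the code.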


[Figure: Example DSP Flow Graph]


Flow Graph Communicators

[Figure: Resulting Stim and Data Communication Paths (figure label: Radar Control)]


Pipeline Flow Graph

[Figure: Pipeline DSP Flow Graph]

[Figure: Resulting Control and Data Communication Paths]


Hybrid Flow Graph

[Figure: Hybrid DSP Flow Graph]


Hybrid Processing

[Figure: Resulting Stim and Data Communication Paths. Legend: box = MPI processor, arrow = data path]


Hybrid  Processing 

*  4  Distinct  Communication  paths  are  formed. 

*  A  path  is  used  per  Radar  Sequence.  (ROUND  ROBIN) 

*  Stim  distributor  determines  which  Comm  path  is  in  use. 

*  Stimulus  is  only  distributed  to  processors  that  need  it. 
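
The round robin selection described in the bullets above can be sketched as follows. This is an illustration only, assuming the stim distributor simply cycles through the paths in order by sequence number. The function names and the representation of a path as a list of ranks are hypothetical, not the infrastructure's API.

```cpp
#include <cassert>
#include <vector>

// Pick the communication path for a radar sequence by cycling through
// the available paths in order (assumed round robin policy).
int path_for_sequence(long sequence_number, int path_count) {
    assert(sequence_number >= 0 && path_count > 0);
    return static_cast<int>(sequence_number % path_count);
}

// Ranks that should receive the stimulus for this sequence:
// only the processors that sit on the active path.
const std::vector<int>& stimulus_targets(const std::vector<std::vector<int>>& paths,
                                         long sequence_number) {
    assert(!paths.empty());
    return paths[path_for_sequence(sequence_number, static_cast<int>(paths.size()))];
}
```

With 4 paths, sequences 0, 1, 2, 3 use paths 0 through 3 and sequence 4 wraps back to path 0. Processors on the other three paths receive no stimulus for that sequence.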


Multiple  Flow  Graphs 
Flow Graph Performance Comparison


Data  Redistribution 


Data  Flow  paths  are  used  for  Redistribution 
Data  Redistribution  Calls  are  Layered  on  MPI 
Provide  2D->2D  with  Overlap  and  Modulo  support 
Insulates  Algorithm  from  Redistribution 


Algorithm  Pseudo  Code  Fragment 


//  Data input is scalable across processors.
//  Receive input data (in this example, from 2 processors to 3 processors).
blocks = Redist( Buffer, 14, 23, 1, 0);

//  Perform the algorithm on the received data.
//  VSIPL provides a platform-independent API.
for( int i = 0; i < blocks; i++)
{
    vsip_ccfftop_f(...);
}

//  Data output is scalable across processors.
//  Send output data (in this example, from 3 processors to 1 processor).
blocks = Redist( Buffer, 14, 32, 1, 0);

The developer can concentrate on the algorithm implementation.


Data  Redistribution 

Without  Overlap 




Redist( Data_Buffer, Splittable Dimension, Unsplittable
Dimension, Modulo, Overlap);

Redist( Buffer, 14, 6, 1, 0);  - Application Redistribution Call


[Figure: Redistribute. A 14x6 data buffer on 2 processors is redistributed to a 14x6 data buffer on 3 processors.]
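
To make the M to N step concrete, the sketch below computes which rows each sending processor would pass to each receiving processor when both sides hold contiguous row ranges of the splittable dimension. It is a sketch under stated assumptions, not the internals of Redist: the `Transfer` struct, the `plan_transfers` function, and the per-rank row counts used in the usage note are hypothetical, and the actual MPI sends and receives are left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// One message: rows [first_row, first_row + row_count) go from src to dst.
struct Transfer {
    int src;
    int dst;
    int first_row;
    int row_count;
};

// Given how many rows each source rank holds and how many each destination
// rank should end up with (contiguous ranges, in rank order), list the
// row ranges that must move between every source/destination pair.
std::vector<Transfer> plan_transfers(const std::vector<int>& src_rows,
                                     const std::vector<int>& dst_rows) {
    assert(std::accumulate(src_rows.begin(), src_rows.end(), 0) ==
           std::accumulate(dst_rows.begin(), dst_rows.end(), 0));
    std::vector<Transfer> plan;
    int src_begin = 0;
    for (std::size_t s = 0; s < src_rows.size(); ++s) {
        const int src_end = src_begin + src_rows[s];
        int dst_begin = 0;
        for (std::size_t d = 0; d < dst_rows.size(); ++d) {
            const int dst_end = dst_begin + dst_rows[d];
            const int lo = std::max(src_begin, dst_begin);
            const int hi = std::min(src_end, dst_end);
            if (lo < hi) {  // the two ranges intersect, so data must move
                plan.push_back({static_cast<int>(s), static_cast<int>(d), lo, hi - lo});
            }
            dst_begin = dst_end;
        }
        src_begin = src_end;
    }
    return plan;
}
```

For 14 rows held as 7 + 7 on two senders and wanted as 5 + 5 + 4 on three receivers (an assumed split), the plan has four transfers: sender 0 supplies rows 0-4 and 5-6, and sender 1 supplies rows 7-9 and 10-13. The algorithm code never sees this bookkeeping, which is the insulation the slides describe.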


Data  Redistribution 

With  Overlap 


Redist( Data_Buffer, Splittable Dimension, Unsplittable
Dimension, Modulo, Overlap);

The Same Call is Made by all 5 Processors

Redist( Buffer, 14, 6, 1, 1);  - Application Redistribution Call With Overlap 1


[Figure: Redistribute. A 14x6 data buffer on 2 processors is redistributed to a 14x6 data buffer on 3 processors; the overlapped data is marked.]
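
The text does not say on which side of a processor's range the overlapped rows are attached. As a hedged illustration only, the sketch below assumes each receiver gets `overlap` extra rows past the end of the range it owns, clamped at the end of the buffer. `RowRange` and `add_trailing_overlap` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>

// A contiguous range of rows along the splittable dimension.
struct RowRange {
    int first;
    int count;
};

// Extend an owned range by `overlap` rows borrowed from the following rank.
// ASSUMPTION: overlap is applied at the trailing edge only; the range is
// clamped so it never runs past the end of the buffer.
RowRange add_trailing_overlap(RowRange owned, int overlap, int total_rows) {
    assert(owned.first >= 0 && owned.count >= 0 && overlap >= 0);
    assert(owned.first + owned.count <= total_rows);
    const int end = std::min(owned.first + owned.count + overlap, total_rows);
    return {owned.first, end - owned.first};
}
```

Under this assumption, a rank owning rows 0-4 of a 14-row buffer receives rows 0-5 with overlap 1, while the rank owning the final rows receives nothing extra because there is no following rank to borrow from.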


Data  Redistribution 

With  Modulo 


Redist( Data_Buffer, Splittable Dimension, Unsplittable
Dimension, Modulo, Overlap);

The Same Call is Made by all 5 Processors

Redist( Buffer, 14, 6, 4, 0);  - Application Redistribution Call With Modulo 4


[Figure: a 14x6 data buffer on 2 processors is redistributed to a 14x6 data buffer on 3 processors, which receive 4 blocks, 4 blocks, and 4 blocks plus the remainder.]
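
One reading of the modulo parameter, consistent with the "4, 4, 4 + remainder" split shown for 14 rows on 3 processors, is that every processor receives a whole multiple of the modulo and the leftover rows go to the last processor. The sketch below implements that reading. It is an assumption about the policy rather than the documented behavior of Redist, and `rows_per_rank` is a hypothetical helper.

```cpp
#include <cassert>
#include <vector>

// Split `total_rows` over `ranks` so that each rank gets a multiple of
// `modulo` rows. Whole blocks are spread as evenly as possible (earlier
// ranks first); rows that do not fill a block are added to the last rank.
std::vector<int> rows_per_rank(int total_rows, int modulo, int ranks) {
    assert(total_rows >= 0 && modulo > 0 && ranks > 0);
    const int whole_blocks = total_rows / modulo;
    const int leftover_rows = total_rows % modulo;
    std::vector<int> rows(ranks);
    for (int r = 0; r < ranks; ++r) {
        const int blocks = whole_blocks / ranks + (r < whole_blocks % ranks ? 1 : 0);
        rows[r] = blocks * modulo;
    }
    rows[ranks - 1] += leftover_rows;  // remainder goes to the last rank
    return rows;
}
```

For 14 rows, modulo 4, and 3 ranks this gives 4, 4, and 6 rows. With modulo 1 the same rule degenerates to a plain even split, which is 5, 5, and 4 rows for the same buffer.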


Matrix  Transpose 


Ct_Transfer( Data_Buffer, Splittable Dimension, Unsplittable
Dimension, Modulo);

The Same Call is Made by all 5 Processors

Ct_Transfer( Buffer, 14, 6, 1);  - Application Matrix Transpose


[Figure: a 14x6 data buffer on 2 processors is transposed onto 3 processors.]
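
In a distributed corner turn, each sender typically extracts the tile formed by its own rows and one receiver's columns, and that tile arrives transposed. The sketch below shows only this local tile step for a row-major buffer. It assumes the receivers split the formerly unsplittable dimension among themselves, which the text does not state, and `transposed_tile` is a hypothetical helper rather than part of Ct_Transfer.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Copy rows [r0, r1) x columns [c0, c1) out of a row-major matrix that has
// `cols` columns, returning the tile already transposed:
// (c1 - c0) rows by (r1 - r0) columns, also row-major.
std::vector<float> transposed_tile(const std::vector<float>& matrix, int cols,
                                   int r0, int r1, int c0, int c1) {
    assert(cols > 0 && 0 <= r0 && r0 <= r1 && 0 <= c0 && c0 <= c1 && c1 <= cols);
    assert(static_cast<std::size_t>(r1) * cols <= matrix.size());
    const int tile_rows = c1 - c0;
    const int tile_cols = r1 - r0;
    std::vector<float> tile(static_cast<std::size_t>(tile_rows) * tile_cols);
    for (int r = r0; r < r1; ++r) {
        for (int c = c0; c < c1; ++c) {
            tile[static_cast<std::size_t>(c - c0) * tile_cols + (r - r0)] =
                matrix[static_cast<std::size_t>(r) * cols + c];
        }
    }
    return tile;
}
```

For the 14x6 example, a sender holding 7 rows would build one 2x7 tile for each of 3 receivers if each receiver takes 2 of the 6 columns. Those counts follow from the assumed even split, not from the slides.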
DSP Infrastructure:

—Supports  Real-Time  High-Performance  Embedded  Radar 
Applications 

•  Low Overhead
•  Scalable to requirements

—Built  on  Open  Architecture  Standards 

•  MPI  and  VSIPL 

—Reduces  development  costs 

•  Scalable to applications with minimal changes to software

—Provides  for  Platform  Independence 

—Provides  DSP  Lifecycle  Support 

•  Scale DSP from Development to Delivery Without Code Modifications
•  Add Algorithms with Minimal Software Changes
•  Reusable Infrastructure and Algorithms
•  Easily Scale DSP for Various Deployments


